Developing Decision Trees for Handling Uncertain Data
نویسندگان
چکیده
Classification is one of the most efficient and widely used data mining technique. In classification, Decision trees can handle high dimensional data, and their representation is intuitive and generally easy to assimilate by humans. Decision trees handle the data whose values are certain. We extend such classifiers i.e, decision trees to handle uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution (pdf’s). Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted that show that the resulting classifiers are more accurate than those using value averages.
منابع مشابه
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
متن کاملDecision Trees for handling Uncertain Data to identify bank Frauds
Classification is a classical problem in machine learning and data mining. In traditional decision tree classification, a feature of a tuple is either categorical or numerical. The decision tree algorithms are used for classify the certain and numerical data for many applications. In existing system they implement the extended the model of decision tree classification to accommodate data tuple ...
متن کاملClassification of Categorical Uncertain Data Using Decision Tree
Certain data is a data whose values are known precisely whereas uncertain data means whose value are not known precisely. But data is always uncertain in real life applications. In data uncertainty attribute value is represented by a set of values. There are two types of attributes in data sets namely, numerical and categorical attributes. Data uncertainty can arise in both numerical and catego...
متن کاملHandling uncertain labels in multiclass problems using belief decision trees
This paper investigates the induction of decision trees based on the theory of belief functions. This framework allows to handle training examples whose labeling is uncertain or imprecise. A former proposal to build decision trees for twoclass problems is extended to multiple classes. The method consists in combining trees obtained from various two-class coarsenings of the initial frame.
متن کاملInterpretable Predictive Models for Knowledge Discovery from Home-Care Electronic Health Records
The purpose of this methodological study was to compare methods of developing predictive rules that are parsimonious and clinically interpretable from electronic health record (EHR) home visit data, contrasting logistic regression with three data mining classification models. We address three problems commonly encountered in EHRs: the value of including clinically important variables with littl...
متن کامل